A natural experiment is a type of observational study in which
treatment assignment, though not randomized by the investigator, 
is plausibly close to random. A process that assigns treatments in 
a highly nonrandom, inequitable manner may, in rare and brief
moments, assign aspects of treatments at random or nearly so. Isolating
those moments and aspects may extract a natural experiment from 
a setting in which treatment assignment is otherwise quite biased, 
far from random. Isolation is a tool that focuses on those rare, brief 
instances, extracting a small natural experiment from otherwise
useless data. We discuss the theory behind isolation and illustrate its
use in a reanalysis of a well-known study of the effects of fertility 
on workforce participation. Whether a woman becomes pregnant at 
a certain moment in her life and whether she brings that pregnancy 
to term may reflect her aspirations for family, education and career, 
the degree of control she exerts over her fertility, and the quality of 
her relationship with the father; moreover, these aspirations and
relationships are unlikely to be recorded with precision in surveys and
censuses, and they may confound studies of workforce participation.
However, given that a woman is pregnant and will bring the
pregnancy to term, whether she will have twins or a single child is, to a
large extent, simply luck. Given that a woman is pregnant at a certain
moment, the differential comparison of two types of pregnancies on
workforce participation, twins or a single child, may be close to
randomized, not biased by unmeasured aspirations. In this comparison,
we find in our case study that mothers of twins had more children 
but only slightly reduced workforce participation, approximately 5% 
less time at work for an additional child. 
1. Constructing natural experiments.

1.1. Natural experiments. Natural experiments are a type of
observational study, that is, a study of the effects caused by treatments when
random assignment is infeasible or unethical. What distinguishes a natural
experiment from other observational studies is the emphasis placed upon
finding unusual circumstances in which treatment assignment, though not
randomized, seems to resemble randomized assignment in that it is
haphazard, not the result of deliberation or considered judgement, not confounded
by the typical attributes that determine treatment assignment in a
particular empirical field. The literature on natural experiments spans the health
and social sciences; see, for instance, Arpino and Aassve (2013), Imai et al. 
(2011), Meyer (1995), Rutter (2007), Sekhon and Titiunik (2012), Susser 
(1981) and Vandenbroucke (2004). 

Traditionally, natural experiments were found, not built. In one sense, 
this seemed inevitable: one needs to find haphazard treatment assignment 
in a world that typically assigns treatments in a biased fashion, often
assigning treatments with a view to achieving an objective. There is, however,
substantial scope for constructing natural experiments. When treatment
assignment is biased, there may be aspects of treatment assignment, present
only briefly, that are haphazard, close to random. The key to constructing 
natural experiments is to isolate these transient, haphazard aspects from 
typical treatment assignments that are biased. If brief haphazard aspects of 
treatment assignment can be isolated from the rest, in the isolated portion 
it is these haphazard elements that are decisive. This is analogous to a
laboratory in which a treatment is studied in isolation from disruptions that
would obscure the treatment’s effects. Laboratories are built, not found. 

1.2. A natural experiment studying effects of fertility on workforce
participation. Does having a child reduce a mother’s participation in the
workforce? If it does, what is the magnitude of the reduction? The question is
relevant to individuals planning families and careers and to legislators and 
managers who determine policies related to fertility, such as family leaves. 
A major barrier to answering this question is that, for many if not most 
women, decisions about fertility, education and career are highly
interconnected, and each decision has consequences for the others. Here we follow
Angrist and Evans (1998) and seek to determine if there is some source of
variation in fertility that does not reflect career plans and is just luck.
Although a woman has the ability to influence the timing of her pregnancies,
given that she is pregnant at a particular age, she has much less influence 
about whether she will have a boy or a girl, whether she will have a single 
child or twins—to a large extent, that is just luck. More precisely, that a 
woman is pregnant at a certain moment in her life may be indicative of 
her unrecorded plans and aspirations for education, family and career, but
conditionally given that she is pregnant at that moment, the birth outcome, 
a boy or a girl or twins, is unlikely to indicate much about her plans and 
aspirations. 

We focus here on the haphazard contrast most likely to shift the total
number of children, namely, a comparison of similar women, one with
twins at her kth birth, the other with children of mixed sex at her kth
birth. As Angrist and Evans (1998) noted, many women or families in the
US prefer to have children of both sexes, rather than just boys or just girls;
that is, a third child is seen in data to be more common if the first two
children have the same sex. While we could compare women having twins
with women having a single child whose sex is the same as her first child, we
focus on comparing women having twins with women having a single child
whose sex is different from her first, since the first woman may end up with
one more child than she intended, whereas the other woman will, at least,
not have additional children simply to have one of each sex.

What question does such a natural experiment answer? Conditionally 
given that a woman with a certain prior history of fertility is currently
pregnant, having a girl or a boy or twins does not pick out a particular type of
woman. So the study is accepting whatever process led a particular woman 
to be pregnant at a certain moment in her life, and it is asking: What would 
happen if she unexpectedly had two children at that pregnancy rather than 
one? How would that event alter her subsequent workforce participation? 
We use the idea from Angrist and Evans (1998) to illustrate and discuss 
tools to extract natural experiments from larger biased data sets, in
particular, risk-set matching [Li, Propert and Rosenbaum (2001)], differential
effects [Rosenbaum (2006, 2013a)] and strengthening an instrumental
variable [Baiocchi et al. (2010), Zubizarreta et al. (2013)].

1.3. Informal review of two key concepts: Differential effects; risk-set
matching. Because differential effects and risk-set matching may be
unfamiliar, we now review the motivation for these techniques. Consider, first,
differential effects and generic biases acting at a single point in time
[Rosenbaum (2006, 2013a)]. Treatment assignment may be biased by certain
unmeasured covariates that promote several treatments in a similar way. When
this is true, receiving a treatment s may be very biased by these covariates,
while receiving one treatment s in lieu of another s' may be unbiased or
less biased or biased in a different way. Here, attention shifts from whether
or not a person received treatment s (i.e., the main effect of s) to whether
a person received treatment s rather than treatment s' conditionally given
that the person received either treatment s or treatment s' (i.e., the
differential effect of s in lieu of s'). Consider an example discussed in detail by
Anthony et al. (2000). There is a theory that nonsteroidal anti-inflammatory 
drugs (NSAIDs), such as ibuprofen (e.g., brand Advil), may reduce the risk
of Alzheimer disease. There is an obvious bias in comparing people who
regularly take ibuprofen and people who do not. In all likelihood, a person who
regularly takes ibuprofen is experiencing chronic pain, perhaps arthritis or 
back pain, is aware of that pain, and is capable of acting deliberately on 
the basis of that awareness. It has been suggested that people in the early 
undiagnosed stages of Alzheimer disease are less aware of pain and less able 
to act on what awareness they have, so that fact alone might produce a
spurious association between use of ibuprofen and lower risk of later diagnosed
Alzheimer disease. There are, however, pain relievers that are not NSAIDs,
for example, acetaminophen (e.g., brand Tylenol). While limited awareness
of pain or limited ability to act on awareness might reduce use of pain
relievers of all kinds, it seems far less plausible that it shifts people away
from ibuprofen and toward acetaminophen. That is, the differential effect 
of acetaminophen-versus-ibuprofen—of one treatment in lieu of the other— 
may not be biased by unmeasured covariates that would bias straightforward 
estimates of the main effect of either drug. Differential effects are not main 
effects, but when differential effects are interesting, they may be immune to 
certain biases that distort main effects. See also Gibbons et al. (2010) for 
differential effects in the study of medications. 

Consider, second, risk-set matching, a device for respecting the temporal 
structure of treatment assignment in observational studies [Li, Propert and 
Rosenbaum (2001)]. For each subject in a randomized experiment, there is 
a specific moment at which this subject is assigned to treatment or to
control. In some observational studies, there is no corresponding moment. Some
people receive treatment at a specific time, others receive it later or never 
receive it, but anyone who does not receive treatment today might receive 
it tomorrow. Risk-set matching pairs two individuals at a specific time, two 
individuals who looked similar in observed covariates prior to that specific 
time, a time at which one individual was just treated and the other was 
not-yet-treated. The not-yet-treated individual may be treated tomorrow, 
next year or never. We compare two individuals who looked similar prior to 
the moment that one of them was treated, avoiding matching or adjustment 
for events subsequent to that moment [cf. Rosenbaum (1984)]. That is, in 
the language of Cox’s proportional hazards model, risk-set matching pairs 
two individuals who were both at risk of receiving the treatment a moment 
before one of them actually received it, two individuals who looked similar 
in time-dependent covariates prior to that moment. Taken alone, without 
differential comparisons, risk-set matching is a method for controlling
measured time-dependent covariates respecting the temporal structure of
treatment assignment; see van der Laan and Robins (2003) for other methods for
this task. 
1.4. Outline of the paper. Section 2 discusses new relevant theory,
specifically theory linking risk-set matching for time-dependent measured
covariates with differential comparisons unaffected by certain unmeasured
time-dependent covariates. Fertility is commonly modeled in terms of “event
history” or point process models determining the timing of events together with
“marks” or random variables describing these randomly timed events. The 
mark may record the occurrence of twins. Temporal order is key and must 
be respected. Sections 3 and 4 complete the case study of twin births with 
the construction of the matched sample using combinatorial optimization for 
risk-set matching discussed in Section 3 and a detailed analysis presented in 
Section 4. Section 5 includes a discussion of related work. 

2. Risk-set matching to control generic unmeasured biases. 

2.1. Notation for treatments over time. The population before matching
contains statistically independent individuals. At time $t$, individual $\ell$ has a
history of events prior to $t$, the observed history being recorded in $x_{\ell t}$ and
the unobserved history being recorded in $u_{\ell t}$. To avoid a formal notation
that we would rarely use, we write histories as variables, $x_{\ell t}$ or $u_{\ell t}$, but
we intend to convey a little more than this. Both the quantity and types
of information in $x_{\ell t}$ or in $u_{\ell t}$ or in $(x_{\ell t}, u_{\ell t})$ increase as time passes, that
is, as $t$ increases [or, formally, the sigma algebra generated by $(x_{\ell t}, u_{\ell t})$ is
contained within the sigma algebra generated by $(x_{\ell t'}, u_{\ell t'})$ for $t < t'$].

In our case study, $x_{\ell t}$ records such things as the ages at which mother $\ell$
gave birth to the children she had prior to time $t$, her years of education
attained at the times of those births before time $t$, and unchanging
characteristics such as her place of birth, race or ethnicity. In parallel, $u_{\ell t}$ might
be an unmeasured quantity reflecting the entire history of a woman’s
inclination to work full time in the year subsequent to time $t$. Obviously, a birth
at time $t$ might, often would, alter $x_{\ell t'}$ or $u_{\ell t'}$ for $t' > t$.

There is also a treatment process $Z_{\ell t}$ that is in one of $K + 1$ states, $s_0$,
$s_1, \ldots, s_K$. That is, at any time $t$, individual $\ell$ is in exactly one of these
states, $Z_{\ell t} = s_k$ for some $k \in \{0, 1, \ldots, K\}$. Also, write $\bar{Z}_{\ell t}$ for the history of
the $Z_{\ell t}$ process strictly prior to time $t$, so $\bar{Z}_{\ell t}$ records $Z_{\ell t'}$ for $t' < t$ but it
does not record $Z_{\ell t}$. In our case study, state $s_0$ is the interval state of not
currently giving birth to a child, state $s_1$ is the point state of giving birth
to a single female child, state $s_2$ is the point state of giving birth to a single
male child, state $s_3$ is the point state of giving birth to a pair of female twins
and so on. Most women are in state $Z_{\ell t} = s_0$ at most times $t$. The history
$\bar{Z}_{\ell t}$ records mother $\ell$’s births up to time $t$, their timing, the sex of the child,
twins, etc.

Consider a specific individual $\ell$ at a specific time $t$. At this moment, the
individual has a treatment history $\bar{Z}_{\ell t}$ prior to $t$ and is about to receive a
current treatment $Z_{\ell t}$. Given the past, $\bar{Z}_{\ell t}$, we are interested in the effect of
the current treatment $Z_{\ell t}$ on some future (i.e., after $t$) outcome $R_\ell$. Write
$\mathcal{F}_{\ell t} = (\bar{Z}_{\ell t}, x_{\ell t}, u_{\ell t})$ for the past at time $t$. In parallel with Neyman (1923)
and Rubin (1974), this individual $\ell$ at this time $t$ has $K + 1$ possible
values for $R_\ell$ depending upon the treatment $Z_{\ell t}$ assigned at time $t$, that is,
$R_\ell = r_{k\ell}$ if $Z_{\ell t} = s_k$, where only one $R_\ell$ is observed from an individual, and
the effect of giving treatment $k$ rather than $k'$ at time $t$, namely, $r_{k\ell} - r_{k'\ell}$,
is not observed for any person at any time. This structure is for individual $\ell$
at a specific time $t$ with treatment history $\bar{Z}_{\ell t}$; typically, everything about
this structure would change if the history $\bar{Z}_{\ell t}$ to time $t$ had been different.
The question is what effect treatment at time $t$ has on an individual with a
specific treatment and covariate history prior to $t$. It is entirely possible,
and indeed in typical applications likely, that the treatments $Z_{\ell t'}$ at times
$t' < t$ alter the value of observed or unobserved subsequent history $(x_{\ell t}, u_{\ell t})$,
but the history at $t$, namely, $(x_{\ell t}, u_{\ell t})$, records the situation just prior to
$t$ and hence is unaffected by the treatment assignment $Z_{\ell t}$ at $t$. Quite
often, the outcome $R_\ell$ is a future value of a quantity that is analogous to a
past quantity recorded in the history $(x_{\ell t}, u_{\ell t})$. In our case study, $R_\ell$ might
measure an aspect of future workforce participation beyond time $t$ where
$(x_{\ell t}, u_{\ell t})$ records workforce participation prior to time $t$, or $R_\ell$ might
measure educational attainment at some time after $t$ where $(x_{\ell t}, u_{\ell t})$ records
educational attainment prior to time $t$.

In our case study, aspects of the record of a woman’s fertility, $\bar{Z}_{\ell t}$, are
likely to be strongly predicted by aspects of her observed and unobserved
histories $(x_{\ell t}, u_{\ell t})$. A woman $\ell$ aged $t' = 18$ years whose private aspiration
$u_{\ell t'}$ is to earn a Ph.D. in molecular biology and an MBA and to start her own
biotechnology company is likely to take active steps to ensure $Z_{\ell t} = s_0$ for
$t \in (18, 22]$ or longer, that is, she is likely to postpone having children for at
least several years. In contrast, another woman $\ell'$ whose private aspiration
$u_{\ell' t'}$ at age $t' = 18$ is to stay at home as the mother of many children may
take active steps to ensure $Z_{\ell' t} \neq s_0$ for several $t \in (18, 22]$, that is, she may
actively pursue her goal of a large family. A comparison of the workforce
participation of woman $\ell$ and woman $\ell'$ will be severely biased as an estimate
of the effects of having a child before age 22 on workforce participation,
because $\ell$ tried to shape her fertility to fit her work plans and $\ell'$ tried to
shape her fertility to fit her family plans. Even if, by some accident, they
had the same pattern of fertility over $t \in (18, 22]$, we would not be surprised
to learn that $\ell$ subsequently worked more for pay than did $\ell'$. What is an
investigator to do when unmeasured aspirations, intentions and goals are
strongly associated with treatment assignment?

2.2. What is risk-set matching? Risk-set matching compares people, say,
$\ell$ and $\ell'$, who received different treatments at time $t$, $Z_{\ell t} \neq Z_{\ell' t}$, but who
looked similar in their observed histories prior to $t$, $x_{\ell t} = x_{\ell' t}$ and $\bar{Z}_{\ell t} = \bar{Z}_{\ell' t}$;
see Li, Propert and Rosenbaum (2001), Lu (2005) and Rosenbaum [(2010),
Section 12]. Importantly, $\ell$ and $\ell'$ are similar prior to $t$ in terms of observable
quantities that may be controlled by matching, but they may not be similar
in terms of unmeasured histories, $u_{\ell t} \neq u_{\ell' t}$, and of course they may differ in
the future, after time $t$, not least because they received different treatments
at time $t$. Risk-set matching does not solve the problem of unmeasured
histories. Risk-set matching does respect the temporal structure of the data,
avoiding adjustment for variables affected by the treatment [Rosenbaum
(1984)]. Risk-set matching also “simplifies the conditions of observation,”
to use Mervyn Susser’s [(1973), Section 7] well-chosen phrase, ensuring that
comparisons are of people with histories that look comparable, even though
those histories may be of different lengths, and hence may contain
qualitatively different information. Although individuals have histories of different
lengths containing qualitatively different information, matched individuals
have histories of the same length. For instance, a woman giving birth to her
3rd child has in her history ages of birth of her first three children, whereas a
mother giving birth to her second child does not have in her history her age
at the birth of her third child, if indeed she had a third child.

In implementing risk-set matching in Section 3, we match women of the 
same age, with the same history of fertility—the same numbers of prior 
children born at the same ages in the same patterns. We also control for 
temporally fixed quantities associated with fertility, such as ethnicity. A 
delicate issue that risk-set matching would straightforwardly address with 
adequate data is “education.” On the one hand, education is strongly related 
to wage income and is related to employment, so it may strongly predict 
certain workforce outcomes $R_\ell$. On the other hand, education may itself be
affected by fertility: a mother who has her first child at age 16 may as a
consequence have difficulty completing high school. In principle, the issue is
straightforward with risk-set matching: in studying the effects of fertility $Z_{\ell t}$
at time $t$, one compares two people who had the same education prior to $t$,
without equating their educations subsequent to time $t$. Again, this avoids
adjustment for variables affected by the treatment [Rosenbaum (1984)]. If
the adjustment for education at time $t$ controlled for subsequent education
at time $t' > t$, it might, and probably would, remove a substantial part of the
actual effect on workforce participation of having a child at age 16. Not 
finishing high school is a good way to have trouble in the labor market, and 
having a child at age 16 is a good way to have trouble finishing high school; 
everyone remembers this until they start running regressions, but then, too 
often, part of an actual effect is removed by adjusting for a posttreatment 
variable that was also affected by the treatment. 

Risk-set matching was discussed by Li, Propert and Rosenbaum (2001) 
and Lu (2005). It has been applied in criminology [Nieuwbeerta, Nagin and 
Blokland (2009), Apel et al. (2010), Murray, Loeber and Pardini (2012)],
sociology [Wildeman, Schnittker and Turney (2012)] and medicine [Kennedy 
et al. (2010)]. See Marcus et al. (2008), Rosenbaum [(2010), Section 12], 
Stuart (2010) and Lu et al. (2011) for related discussion. 

2.3. Removing generic unmeasured biases by differential comparisons in
risk sets. The model for biased treatment assignment in risk-set
matching is intended to express the thought that matching for the observed past,
$(\bar{Z}_{\ell t}, x_{\ell t})$, has controlled for the observed past but typically did not control
for the unobserved past $u_{\ell t}$. The model is a slight generalization to multiple
states of the model for two states in Li, Propert and Rosenbaum [(2001),
Section 4], and that model is itself closely patterned after Cox’s (1972)
proportional hazards model for outcomes rather than treatments. People are
in state $s_0$ almost all the time, and are in states $s_1, \ldots, s_K$ only at points
in time. Let $\lambda_k(\mathcal{F}_{\ell t}) = \lambda_k(\bar{Z}_{\ell t}, x_{\ell t}, u_{\ell t})$ be the hazard, assumed to exist, of
entering state $k \geq 1$ at time $t$ given past $\mathcal{F}_{\ell t}$. The hazard is assumed to be
of the form $\lambda_k(\bar{Z}_{\ell t}, x_{\ell t}, u_{\ell t}) = \exp\{\kappa_k(\bar{Z}_{\ell t}, x_{\ell t}) + \phi_k u_{\ell t}\}$ where $\kappa_k(\cdot, \cdot)$ is
unknown. Because $x_{\ell t}$ may include as one of its coordinates the time $t$, this
model permits the hazards to vary with time $t$. For state $s_0$, it is notationally
convenient to define $\lambda_0(\cdot, \cdot, \cdot) = 1$ and $\phi_0 = 0$.

In Section 2.1, $u_{\ell t}$ was described as a possibly multivariate history of a
possibly continuous process in time, whereas in the hazard model,
$\exp\{\kappa_k(\bar{Z}_{\ell t}, x_{\ell t}) + \phi_k u_{\ell t}\}$, the unobserved element has become a scalar. This
seems at first to be an enormous and disappointing loss of generality, but
upon reflection one sees that the loss is not great. Suppose $\tilde{u}_{\ell t}$ did record a
multivariate history over time, and consider the hazard model
$\exp\{\kappa_k(\bar{Z}_{\ell t}, x_{\ell t}) + \phi_k f(\tilde{u}_{\ell t})\}$ where $f(\cdot)$ is some unknown real-valued functional
of that multivariate, temporal history. Although this appears at first to be a more
general model, writing $u_{\ell t} = f(\tilde{u}_{\ell t})$, the model becomes
$\exp\{\kappa_k(\bar{Z}_{\ell t}, x_{\ell t}) + \phi_k u_{\ell t}\}$, a scalar model essentially as before. In words, in
$\exp\{\kappa_k(\bar{Z}_{\ell t}, x_{\ell t}) + \phi_k f(\tilde{u}_{\ell t})\}$, not knowing $\tilde{u}_{\ell t}$ and not knowing $f(\cdot)$ is no
better and no worse than not knowing the scalar $u_{\ell t} = f(\tilde{u}_{\ell t})$. It is the impact
of unmeasured history on the hazard, a scalar, that matters, not the particulars
of that history. See Li, Propert and Rosenbaum (2001) and Lu (2005) for related
discussion.

Let $s \in \{s_1, \ldots, s_K\}$ be one of the point states or birth outcomes (single
girl, etc.), and let $s' \neq s$ be any one of the other states, $s' \in \{s_0, s_1, \ldots, s_K\}$.
Here, $s'$ may be either the state $s_0$ of not giving birth or a point state.
Suppose that we form a risk-set match of one individual $\ell$ with $Z_{\ell t} = s$ and
$J - 1 \geq 1$ other individuals $\ell'$ in state $s'$ at $t$, where all $J$ individuals have
the same observed history to time $t$, $\bar{Z}_{\ell t} = \bar{Z}_{\ell' t}$ and $x_{\ell t} = x_{\ell' t}$. For instance,
this might be a match of $J$ women with the same observed history to time
$t$, one of whom gave birth to her first child at $t$, a single girl $s_1$, where the
other $J - 1$ women had had no child up to and including time $t$. Despite
looking similar prior to time $t$, it is possible, perhaps likely, that these $J$
women differed in their ambitions for school or work. After all, one
had a child at time $t$ while the others did not. Alternatively, the matching
might compare a woman who had her first child, a girl or point state $s_1$,
at time $t$ to $J - 1$ women with the same observable past who had a first
child, a boy or point state $s_2$, at time $t$. Perhaps this second comparison is
closer to random than the previous comparison of women with and without
children at time $t$, because now all $J$ women had their first child at time $t$,
and it was only the sex of the child that varied. Obviously, there are many
analogous possibilities, but we suppose the investigator will focus on one such
comparison at a time, for now, $s$ and $s'$ with $s \neq s'$ and $s, s' \in \{s_0, \ldots, s_K\}$.
The risk-set match is built rolling forward in time $t$, matching women with
states $s$ or $s'$ at $t$ and with identical observable pasts, $(\bar{Z}_{\ell t}, x_{\ell t})$, possibly
different unobservable pasts $u_{\ell t}$, removing individuals once matched; however,
events subsequent to time $t$ are not used in matching at time $t$. In the end,
there are $I$ nonoverlapping matched sets, each containing $J$ individuals. It
is notationally convenient to replace the label $\ell$, where $\ell$ does not indicate
who is matched to whom, by noninformative labels for sets, $i = 1, \ldots, I$, and
for individuals within sets, $j = 1, \ldots, J$; for instance, random labels could
be used. We then have $\bar{Z}_{ijt} = \bar{Z}_{ij't}$ and $x_{ijt} = x_{ij't}$ for all $i, j, j'$, but
possibly $u_{ijt} \neq u_{ij't}$. Also, write
$\mathcal{F}_{it} = (\bar{Z}_{i1t}, x_{i1t}, u_{i1t}, \ldots, \bar{Z}_{iJt}, x_{iJt}, u_{iJt})$. Let $\mathcal{Z}$
be the event that for each $i$, exactly one individual $j$ has $Z_{ijt} = s$ and the
remaining $J - 1$ individuals $j'$ have $Z_{ij't} = s'$, so the risk-set matched design
ensures that $\mathcal{Z}$ occurs. Given $\mathcal{Z}$, the time $t$ is fixed, and the two states, $s$
and $s'$, are fixed, so it is convenient to write $Z_{ij} = 1$ if $Z_{ijt} = s$ and $Z_{ij} = 0$
if $Z_{ijt} = s'$, so that $1 = \sum_{j=1}^{J} Z_{ij}$ for each $i$.
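
The rolling construction described above can be sketched in code. The following Python sketch is a simplification under stated assumptions, not the procedure of the case study: it forms sets greedily within strata of identical observed pasts, whereas Section 3 uses combinatorial optimization. The function name `risk_set_match`, the tuple layout of `events` and the hashable `history` summary are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

def risk_set_match(events, s, s_prime, J=2):
    """Greedy exact risk-set matching, rolling forward in time.

    events: iterable of (person_id, t, state, history) tuples, where
    `history` is a hashable summary of the observed past strictly
    before t (hypothetical encoding, e.g. prior birth ages plus fixed
    covariates). Nothing observed after t enters `history`.
    Returns matched sets as (person_in_state_s, [persons_in_state_s_prime]).
    """
    by_time = defaultdict(list)
    for person, t, state, history in events:
        by_time[t].append((person, state, history))

    matched = set()   # individuals are removed once matched
    sets = []
    for t in sorted(by_time):          # roll forward in time
        # Stratify people at time t by identical observed past.
        pools = defaultdict(lambda: {s: [], s_prime: []})
        for person, state, history in by_time[t]:
            if person not in matched and state in (s, s_prime):
                pools[history][state].append(person)
        for pool in pools.values():
            in_s, in_sp = pool[s], pool[s_prime]
            # One individual in state s with J - 1 individuals in state s'.
            while in_s and len(in_sp) >= J - 1:
                focal = in_s.pop(0)
                others = [in_sp.pop(0) for _ in range(J - 1)]
                sets.append((focal, others))
                matched.update([focal] + others)
    return sets
```

Because each individual is removed once matched, the resulting sets do not overlap, and because `history` is fixed before the state at time t is examined, events subsequent to t cannot influence who is matched to whom.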

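The next step is key. Although there are $\binom{K+1}{2}$ possible choices of two
states $s, s' \in \{s_0, \ldots, s_K\}$ to compare by risk-set matching, the same
unobserved covariate $u_{ijt}$ can severely bias some choices of two states, while
others may be nearly random or only slightly biased. Consider the conditional
probability that, in set $i$ of this risk-set matched design, it is individual
$j$ who received treatment $s$, with $Z_{ijt} = s$, the remaining $J - 1$
individuals receiving treatment $s'$. Using
(i) $\lambda_k(\bar{Z}_{ijt}, x_{ijt}, u_{ijt}) = \exp\{\kappa_k(\bar{Z}_{ijt}, x_{ijt}) + \phi_k u_{ijt}\}$,
(ii) $\bar{Z}_{ijt} = \bar{Z}_{ij't}$ and $x_{ijt} = x_{ij't}$, and
(iii) $\sum_{j' \neq j} \phi_{s'} u_{ij't} = \sum_{j'=1}^{J} \phi_{s'} u_{ij't} - \phi_{s'} u_{ijt}$
yields

$$
\begin{aligned}
\Pr(Z_{ijt} = s \mid \mathcal{F}_{it}, \mathcal{Z})
&= \frac{\exp\{\kappa_s(\bar{Z}_{ijt}, x_{ijt}) + \phi_s u_{ijt}\} \prod_{j' \neq j} \exp\{\kappa_{s'}(\bar{Z}_{ij't}, x_{ij't}) + \phi_{s'} u_{ij't}\}}
        {\sum_{m=1}^{J} \exp\{\kappa_s(\bar{Z}_{imt}, x_{imt}) + \phi_s u_{imt}\} \prod_{m' \neq m} \exp\{\kappa_{s'}(\bar{Z}_{im't}, x_{im't}) + \phi_{s'} u_{im't}\}} \\
&= \frac{\exp(\phi_s u_{ijt} + \sum_{j' \neq j} \phi_{s'} u_{ij't})}
        {\sum_{m=1}^{J} \exp(\phi_s u_{imt} + \sum_{m' \neq m} \phi_{s'} u_{im't})} \\
&= \frac{\exp\{(\phi_s - \phi_{s'}) u_{ijt}\}}
        {\sum_{m=1}^{J} \exp\{(\phi_s - \phi_{s'}) u_{imt}\}} \\
&= \frac{\exp(\gamma u_{ijt})}{\sum_{m=1}^{J} \exp(\gamma u_{imt})}
\quad \text{where } \gamma = \phi_s - \phi_{s'},
\end{aligned}
\tag{1}
$$

where the last expression (1) is the same as the sensitivity analysis model in
Rosenbaum (2007, 2013b) for comparing treatment and control in $I$ matched
sets.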
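
The cancellation that produces (1) can be checked numerically. The sketch below is illustrative only: the function names and the numerical values are hypothetical, and the common terms $\kappa_s$ and $\kappa_{s'}$ are passed as single numbers because matching equates the observed pasts within a set. It computes the conditional probability directly from the hazards and compares it with the reduced form $\exp(\gamma u_{ijt}) / \sum_m \exp(\gamma u_{imt})$.

```python
import math

def assignment_prob(kappa_s, kappa_sp, phi_s, phi_sp, u):
    """Pr(individual j is the one in state s | matched set, event Z),
    computed directly from the hazards exp(kappa + phi * u).
    kappa_s, kappa_sp are common to the set because observed pasts match."""
    J = len(u)
    weights = []
    for j in range(J):
        # individual j enters state s, all others enter state s'
        w = math.exp(kappa_s + phi_s * u[j])
        for m in range(J):
            if m != j:
                w *= math.exp(kappa_sp + phi_sp * u[m])
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def reduced_prob(gamma, u):
    """Last expression in (1): exp(gamma * u_j) / sum_m exp(gamma * u_m)."""
    e = [math.exp(gamma * uj) for uj in u]
    total = sum(e)
    return [x / total for x in e]

# Hypothetical values: three matched individuals with different u.
u = [0.0, 0.3, 1.0]
direct = assignment_prob(0.7, -1.2, 2.0, 1.5, u)
reduced = reduced_prob(2.0 - 1.5, u)   # gamma = phi_s - phi_s'
```

In this sketch `direct` and `reduced` agree up to floating-point error, and setting `phi_s == phi_sp` gives probability $1/J$ for every individual however large the common value is.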

The key point is that there may be reason to believe that $|\phi_s - \phi_{s'}|$ is
small for some choices of $s, s'$, and large for other choices. Refraining from
having a child, $s_0$, is often a carefully planned event, but whether a child
is a boy or a girl, twins or a single birth, is a much more haphazard event.
Some comparisons are expected to be less biased by unmeasured intentions
and preferences than other comparisons. If a careful choice of $s, s'$ implies
that $|\gamma| = |\phi_s - \phi_{s'}|$ is small, then the inference about treatment effects may
be convincing if it is insensitive to small biases $|\gamma|$ even if it is sensitive to
moderate biases. If $\phi_s - \phi_{s'} = 0$, then (1) is the randomization distribution,
$\Pr(Z_{ijt} = s \mid \mathcal{F}_{it}, \mathcal{Z}) = 1/J$ for each $ijt$; moreover, this is true even if $\phi_s$ and
$\phi_{s'}$ are large, so that comparing mothers who had children at different times
would be severely biased by $u_{ijt}$.


2.4. Sensitivity analysis for any remaining differential biases. If $\phi_s \neq \phi_{s'}$,
but $|\gamma| = |\phi_s - \phi_{s'}|$ is small in (1), then the differential comparison of
treatments $s$ and $s'$ in (1) may still be biased by $u_{ijt}$, and the sensitivity
analysis examines the possible consequences of biases of various magnitudes
$\gamma$. In the current paper, the sensitivity analyses use (1) with a test statistic
that is either the mean difference in workforce participation or a
corresponding M-estimate with Huber’s weights. Of course, the mean difference is one
particular form of M-estimate. The sensitivity analysis was implemented as
described in Rosenbaum (2007) with the restriction that $u_{ijt} \in [0, 1]$, so that
under (1) matched mothers may differ in their hazards of birth outcome $s$
versus $s'$ by at most a factor of $\Gamma = \exp(\gamma)$. In the comparison in Section 4,
this means that two mothers with the same pattern of fertility and observed
covariates to time $t$, both of whom gave birth at time $t$, may differ in their
odds of having a twin, $s$, rather than a single child of a different sex than her
earlier children, $s'$, by at most a factor of $\Gamma$ because of differences in the
unmeasured $u_{ijt}$. Although biases of this sort are not inconceivable, perhaps as
a consequence of differential use of abortion or fertility treatments,
presumably such a bias $\Gamma$ is not very large, much smaller than the biases associated
with efforts to control the timing of births. The one parameter $\Gamma$ may be
reinterpreted in terms of two parameters describing treatment-control pairs,
one $\Delta$ relating $u_{ij}$ to the outcome $(r_{Tij}, r_{Cij})$, the other $\Lambda$ relating $u_{ij}$ to
the treatment $Z_{ij}$, such that a single value of $\Gamma$ corresponds to a curve of
values of $(\Delta, \Lambda)$ defined by $\Gamma = (\Delta\Lambda + 1)/(\Delta + \Lambda)$, so a brief unidimensional
analysis in terms of $\Gamma$ may be interpreted in terms of infinitely many
two-dimensional analyses in terms of $(\Delta, \Lambda)$; see Rosenbaum and Silber (2009).
For instance, the curve for $\Gamma = (\Delta\Lambda + 1)/(\Delta + \Lambda) = 1.25$ includes the point
$(\Delta, \Lambda) = (2, 2)$ for a doubling of the odds of treatment and a doubling of the
odds of a positive pair difference in outcomes. Hsu and Small (2013) show
how to calibrate a sensitivity analysis about an unobserved covariate using
the observed covariates.
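
The curve $\Gamma = (\Delta\Lambda + 1)/(\Delta + \Lambda)$ is easy to tabulate. The short Python sketch below is illustrative; the function names `gamma_from` and `lam_for` are hypothetical. The second function solves the same equation for $\Lambda$, which requires $\Delta > \Gamma$ for a positive, finite solution.

```python
def gamma_from(delta, lam):
    """Gamma implied by the two-parameter pair (Delta, Lambda)."""
    return (delta * lam + 1.0) / (delta + lam)

def lam_for(gamma, delta):
    """Lambda on the curve for a given Gamma and Delta.

    Rearranging Gamma * (Delta + Lambda) = Delta * Lambda + 1 gives
    Lambda = (Gamma * Delta - 1) / (Delta - Gamma), which needs Delta > Gamma.
    """
    if delta <= gamma:
        raise ValueError("need delta > gamma for a finite positive lambda")
    return (gamma * delta - 1.0) / (delta - gamma)

# The point quoted in the text: (Delta, Lambda) = (2, 2) lies on Gamma = 1.25.
g = gamma_from(2.0, 2.0)
# Other points on the same curve, for hypothetical choices of Delta.
curve = [(d, lam_for(1.25, d)) for d in (1.5, 2.0, 3.0, 5.0)]
```

Every pair in `curve` maps back to $\Gamma = 1.25$ under `gamma_from`, which is the sense in which one unidimensional analysis stands for many two-dimensional ones.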

What is the role of the restriction $u_{ijt} \in [0, 1]$? The restriction $u_{ijt} \in [0, 1]$
gives a simple numerical meaning to $\gamma$ and $\Gamma$ by fixing the scale of the
unobserved covariate: in (1), two subjects may differ in their hazard of treatment
$s$ rather than treatment $s'$ at time $t$ by at most a factor of $\Gamma$ because they
differ in terms of $u_{ijt}$. It is possible to replace the restriction that $u_{ijt} \in [0, 1]$
for all $ijt$ by the restriction that $u_{ijt} \in [0, 1]$ for, say, 99% of the $ijt$ with
the remainder unrestricted [Rosenbaum (1987), Section 4]; however, when
using robust methods, small parts of the data make small contributions to
the inference, so this replacement has limited impact. Permitting 1% of the
$u_{ijt}$ to be unrestricted should count as a larger bias, in some sense a larger
$\gamma$, and Wang and Krieger (2006) explore this possibility in a special case,
concluding that binary $u_{ijt}$ do the most damage among all $u_{ijt}$ with a fixed
standard deviation.
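
To make the numerical meaning of $\Gamma$ concrete, consider the special case of matched pairs, $J = 2$, with $\gamma \geq 0$. The following worked bound is a sketch that follows directly from (1) and the restriction $u_{ijt} \in [0, 1]$; the illustrative value $\Gamma = 1.25$ is chosen only to echo the example above.

```latex
\Pr(Z_{i1t} = s \mid \mathcal{F}_{it}, \mathcal{Z})
  = \frac{\exp(\gamma u_{i1t})}{\exp(\gamma u_{i1t}) + \exp(\gamma u_{i2t})}
  = \frac{1}{1 + \exp\{\gamma (u_{i2t} - u_{i1t})\}},
\qquad
\frac{1}{1 + \Gamma}
  \;\le\; \Pr(Z_{i1t} = s \mid \mathcal{F}_{it}, \mathcal{Z})
  \;\le\; \frac{\Gamma}{1 + \Gamma},
\quad \Gamma = \exp(\gamma).
% The lower bound is attained at (u_{i1t}, u_{i2t}) = (0, 1),
% the upper bound at (u_{i1t}, u_{i2t}) = (1, 0).
% Example: Gamma = 1.25 gives 1/2.25 = 0.444... and 1.25/2.25 = 0.555...
```

With $\Gamma = 1$ both bounds equal $1/2$, the randomization distribution for pairs.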

For discussion of a variety of methods of sensitivity analysis in
observational studies, see Baiocchi et al. (2010), Cornfield et al. (1959), DiPrete
and Gangl (2004), Egleston, Scharfstein and MacKenzie (2009), Gastwirth
(1992), Hosman, Hansen and Holland (2010), Li, Propert and Rosenbaum
(2001), Lin, Psaty and Kronmal (1998), Liu, Kuramoto and Stuart (2013), 
Marcus (1997), McCandless, Gustafson and Levy (2007), Robins, Rotnitzky 
and Scharfstein (2000), Rosenbaum (2007, 2013b), Small (2007), Small and 
Rosenbaum (2008) and Yu and Gastwirth (2005). 

2.5. What is isolation? Isolation refers to equation (1) and is motivated
by the possibility that $|\phi_s - \phi_{s'}|$ may be small or zero when neither $\phi_s$ nor
$\phi_{s'}$ is small or zero. If $\phi_s$ is not small, receipt of treatment $s$ rather than
no treatment will be biased by the unmeasured time-dependent covariate
$u_{ijt}$. In parallel, if $\phi_{s'}$ is not small, receipt of treatment $s'$ rather than no
treatment will be biased by $u_{ijt}$. However, if $\phi_s = \phi_{s'}$, then the differential
comparison of treatments $s$ and $s'$, conditionally given one of them, will not
be biased by $u_{ijt}$, even though $\phi_s$ and $\phi_{s'}$ may both be large. If unmeasured
aspirations and plans ($u_{ijt}$) influence the timing of fertility but not whether
twins ($s$) or a single child ($s'$) is born, then a comparison of two mothers
with the same timing, one with twins, the other with a single child, is not
biased by the unmeasured aspirations and plans. Equation (1) isolates biased
timing from possibly unbiased birth outcomes given timing. The sensitivity
analysis considers the possibility that $|\phi_s - \phi_{s'}|$ is small but not zero, so
there is some differential bias.
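
The cancellation can be seen in a small numerical sketch. Equation (1) is not restated in this section, so the sketch assumes, purely for illustration, that the hazard of treatment $s$ is proportional to $\exp(\xi_s + \phi_s u)$; the function name and all parameter values are hypothetical.

```python
import math


def prob_s_given_s_or_sprime(xi_s, xi_sp, phi_s, phi_sp, u):
    """P(treatment s | s or s' occurs), ASSUMING hazards exp(xi + phi * u).

    Under this assumed form the conditional odds of s versus s' are
    exp((xi_s - xi_sp) + (phi_s - phi_sp) * u), so u matters only
    through the difference phi_s - phi_sp.
    """
    h_s = math.exp(xi_s + phi_s * u)
    h_sp = math.exp(xi_sp + phi_sp * u)
    return h_s / (h_s + h_sp)


def odds(p):
    return p / (1.0 - p)


# Both phi's are large (timing is badly biased by u), but they are equal:
# the differential comparison does not depend on u at all.
p_low = prob_s_given_s_or_sprime(-3.0, 0.0, 3.0, 3.0, u=0.0)
p_high = prob_s_given_s_or_sprime(-3.0, 0.0, 3.0, 3.0, u=1.0)
print(p_low, p_high)

# A small difference phi_s - phi_sp = log(1.25) leaves some differential bias:
# moving u from 0 to 1 multiplies the conditional odds by 1.25.
g = math.log(1.25)
q_low = prob_s_given_s_or_sprime(-3.0, 0.0, 3.0, 3.0 - g, u=0.0)
q_high = prob_s_given_s_or_sprime(-3.0, 0.0, 3.0, 3.0 - g, u=1.0)
print(odds(q_high) / odds(q_low))
```

The first pair of probabilities coincides even though each $\phi$ equals 3; the second comparison shows the kind of small residual bias that the sensitivity analysis is meant to cover.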

In the case study, it seems likely that the timing of births is affected
by unmeasured covariates $u_{ijt}$ but, conditionally given a birth, specific birth
outcomes are close to random; that is, each $\phi_s$ is not small but each $|\phi_s - \phi_{s'}|$
is small. In some other context, it might be that $|\phi_s - \phi_{s'}|$ is thought to be
small for some pairs $s, s' \in \{1, \ldots, K\}$ and not for others, and, in this case,
attention might be restricted to a few comparisons for which $|\phi_s - \phi_{s'}|$ is
thought to be small.

No matter how deliberate and purposeful a life may be, there are brief 
moments when some consequential aspect of that life is determined by something
haphazard. Isolation narrows the focus in two ways: the moment and
the aspect. One compares people who appeared similar a moment before 
luck played its consequential role. Among such people, one considers only a 
consequential aspect controlled by luck. Isolation refers to the joint use of 
risk-set matching to focus on a moment and differential effects to focus on 
an aspect. 

2.6. Selecting strong but haphazard comparisons. To emulate a randomized
experiment, a natural experiment should have a consequential difference
in treatments determined by something haphazard. The strongest contrast is
twins at birth $k$ versus mixed sex children at birth $k$, because this comparison
is expected to do the most to shift the number of children. The population
of pregnant women would not be distorted by limiting attention to these
two groups, providing that the unobserved $u_{ijt}$ affects the timing but not
the outcome of pregnancies (i.e., providing $\phi_s = \phi_{s'}$ for $s, s' \in \{1, \ldots, K\}$).

Natural experiments may yield instrumental variables where “strong”
refers to the strength of the instrument. An instrument is a haphazard
nudge to accept a higher dose of treatment, where the nudge affects the
outcome only if it alters the dose of treatment, the so-called “exclusion
restriction”; see Holland (1988) and Angrist, Imbens and Rubin (1996). In
Section 2.3, some patterns of births (e.g., twins) may induce women to have
more children than they would have had with a different pattern of births,
so perhaps certain patterns are instruments for family size (the dose). An
instrument is weak if most nudges are ignored, rarely altering the dose. An
instrument is strong if it typically materially alters the dose. Weak instruments
create inferential problems with limited identification [Bound, Jaeger
and Baker (1995), Imbens and Rosenbaum (2005), Small (2007)] and, more
importantly, inferences based on weak instruments are invariably sensitive to
tiny departures from randomized assignment [Small and Rosenbaum (2008)].
Therefore, it is often advantageous to strengthen an instrument [Baiocchi
et al. (2010), Zubizarreta et al. (2013)].

Is the exclusion restriction plausible here? Perhaps not. The exclusion 
restriction would mean that having twins affects workforce participation 
only by altering the total number of children. If a mother wanted three 
children but had twins at her second pregnancy, the occurrence of twins 
might have altered the timing of her children’s births rather than the total 
number of children. A mother who wished to stay at home until her three 
children had entered kindergarten might return to work sooner because of 
twins at the second birth without altering her total number of children, and 
in this case the exclusion restriction would not be satisfied. 

Even if the exclusion restriction does not hold, so the natural experiment
does not yield an instrument, it is nonetheless advantageous to have
a consequential difference in treatments determined by something that is
haphazard. In particular, the Wald estimator commonly used with instrumental
variables estimates a ratio of treatment effects, a so-called effect
ratio, when the exclusion restriction does not hold. The effect ratio is a
local-average treatment effect when the exclusion restriction holds, but it is
interpretable without that condition; see Section 4 and Baiocchi et al. (2010)
for further discussion.

A distinction is sometimes made between internal and external validity, a
distinction introduced by Donald T. Campbell and colleagues, a distinction
that Campbell (1986) later attempted to revise. In revised form, internal
validity became “local causal validity,” meaning correct estimation of the
effects of the treatments actually studied in the populations actually studied.
What had been external validity separated into several concepts, each
referring to some generalization, perhaps from the treatments under study to
other related treatments, from the populations under study to other related
populations, or from the outcome measures under study to other related
measures. Because it uses Census data from 1980, Angrist and Evans’ (1998)
study concerns a well-defined population at a particular era in history,
and results about women’s workforce participation might easily be different
in the US in earlier and later eras. It would be comparatively straightforward
to replicate their study using Census data from other eras or using
similar data in other countries. Their study is reasonably compelling as a
study of the effects of having twins rather than a single child but, as the
discussion of the exclusion restriction above makes clear, it is not certain
that having twins has the same effect on workforce participation as having
two children at different times. Moreover, the study provides no information
about women who have no children at all. In brief, twinning is typically an
unintended and somewhat random event, whereas many women attempt to
carefully, thoughtfully and deliberately control the timing of fertility, so
Angrist and Evans’ study has unusual strengths in local causal validity, but one
needs to avoid extrapolating their findings to other eras or types of fertility
that they did not study.

3. The risk-set match. 

3.1. One matched risk set. We created nonoverlapping matched sets of
6 women who were similar prior to the birth of their kth child, for k = 2,
3, 4, one of whom had a twin on this kth birth, whereas the others had
children of both sexes as of the kth child. For instance, matched set #836
consisted of six women. All six women had their first child at age 18 and 
their second child at age 22, and all were white. After the birth of the second 
child, five of the mothers had one boy and one girl, and one of the mothers 
had twins at the second pregnancy. A mother’s plans for education, career 
and family may easily influence the timing of her pregnancies, but these six 
women gave birth at the same ages. A mother’s plans for education, career 
and family are much less likely to determine which of the six pregnancies will 
end with twins and which will end with two children of different sexes—for 
most mothers, that’s just luck. All six mothers had 12 years of education at 
the time of their first and second births at ages 18 and 22, respectively; see 
Section 3.2 for technical details about this statement. 

Matched sampling controls, or should control, for the past, not the future
[Rosenbaum (1984)]. The six women were similar prior to their second
pregnancy. They had different outcomes at their second pregnancy. What
happened subsequently? The woman with twins ended up with 3 children
in total, the other five women ended up with two children each; that is,
none of these women went on to have additional children beyond their second
pregnancy. The pattern is different in other matched sets. In this one
matched set, all six women had no additional education beyond the 12 years 
they had at age 18, the age of their first birth. In this particular matched 
set, the mother of twins ranked third in workforce participation. In the year 
prior to the 1980 Census, two of the women with two children had worked 
at least 40 hours in the previous week and 52 weeks in the previous year, 
while the remaining three women with two children had not worked at all in 
the previous year. The woman with twins, with three children, had worked 
40 hours in the previous week and 20 weeks in the previous year. 

Matched sets varied, but set #836 was typical in one respect. In the 
matched comparison, it was uncommon for women who had children by 
age 18 to ultimately complete a BA degree—only 5.5% did so—whereas it 
was much more common for women who did not have a child by age 18 to 
complete a BA degree—28.2% did so. Total lifetime education is the sum of 
two variables, a covariate describing education prior to the kth birth and an 
outcome describing additional education subsequent to the kth birth. Risk- 
set matching entails matching for the covariate—the past—but not for the 
outcome—the future. 

3.2. Technical detail: How the matching was done. Matches were constructed
in temporal order, beginning with the second pregnancy. Mothers
not matched at the second pregnancy might be matched later. The
matching was exact for three variables: age category at the second pregnancy,
race/ethnicity and region of the US; see Table 1. Within each of
these $64 = 4^3$ cells, the match solved a combinatorial optimization problem
to make the mother of twins similar to the five control mothers in the
same matched set. Similarity was judged by a robust Mahalanobis distance
[Rosenbaum (2010), Section 8.3] using observed covariates $x_{ij}$ prior to this
pregnancy. Forming nonoverlapping matched sets to minimize the sum of
the treated-versus-control distances within sets is a version of the optimal
assignment problem, and it may be solved using the pairmatch function of
Hansen’s (2007) optmatch package in R. [We used mipmatch in R available at
http://www-stat.wharton.upenn.edu/~josezubi/; see Zubizarreta (2012).]
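
To make the optimization target concrete, the following toy sketch forms nonoverlapping 1-k matched sets that minimize the total treated-versus-control distance. It is not the method used for the case study: it searches exhaustively (feasible only for a handful of units), it uses absolute age difference as a stand-in for the robust Mahalanobis distance, and the function name and data are hypothetical.

```python
from itertools import combinations


def best_matching(dist, k):
    """Assign k distinct controls to each treated unit, minimizing total distance.

    dist[t][c] is the distance from treated unit t to control c.
    Returns (total_distance, [tuple of control indices per treated unit]).
    Exhaustive search: exponential time, for tiny illustrative inputs only.
    """
    n_controls = len(dist[0])

    def search(t, available):
        if t == len(dist):
            return 0.0, []
        best_cost, best_sets = float("inf"), None
        for group in combinations(sorted(available), k):
            cost = sum(dist[t][c] for c in group)
            rest_cost, rest_sets = search(t + 1, available - set(group))
            if cost + rest_cost < best_cost:
                best_cost, best_sets = cost + rest_cost, [group] + rest_sets
        return best_cost, best_sets

    return search(0, frozenset(range(n_controls)))


# Hypothetical ages at the second birth: two mothers of twins, five controls.
treated_ages = [20, 25]
control_ages = [19, 21, 24, 26, 30]
distances = [[abs(t - c) for c in control_ages] for t in treated_ages]

total, matched_sets = best_matching(distances, k=2)
print(total, matched_sets)
```

In practice the same objective is handed to a dedicated solver, such as the R functions named above, rather than enumerated.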

From the Census data, we can know the education of the mother prior to 
the Census, her age at the Census and the ages of her children, and from 
this we can deduce her ages at the births of her children. Ideally, we would 
know exactly her years of education at the birth of each of her children, 
but the Census provides slightly less information. The norm in the US is to 
complete high school with 12 years of education at age 18. If a woman had a 
total of $E$ years of education at the time of the census and if she was age $A$
at her kth pregnancy, we credited her with $\min(E, A - 6)$ years of education
at her kth pregnancy. For instance, a woman who had a BA degree with 16
years of education and a first child at age 26 was credited with 16 years of 
education at the birth of her first child. This is a reasonable approximation 
but will err in some cases. The exact timing of education is available in some 
longitudinal data sets. 
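
The crediting rule is a one-line computation. The sketch below applies it to the example in the text and to the mothers of matched set #836; the function name is hypothetical.

```python
def education_at_birth(total_years: int, age_at_birth: int) -> int:
    """Years of education credited at a birth: min(E, A - 6)."""
    return min(total_years, age_at_birth - 6)


# BA degree (16 years), first child at age 26: credited with 16 years.
print(education_at_birth(16, 26))

# Matched set #836: 12 years in total, births at ages 18 and 22.
print(education_at_birth(12, 18), education_at_birth(12, 22))

# A case where the rule binds: 16 years in total but a birth at age 20.
print(education_at_birth(16, 20))
```

The last case shows where the approximation matters: schooling completed after the birth is treated as an outcome, not as a covariate.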

3.3. Covariate balance prior to the kth birth in the risk-set match. Figures
1 and 2 show the balance on age at each pregnancy and education
at each pregnancy. The match at the second pregnancy should balance age
and education at the first two pregnancies, viewing subsequent events as
outcomes. The match at the third pregnancy should balance age and education
at the first three pregnancies, viewing subsequent events as outcomes.
The match at the fourth pregnancy is analogous. Figures 1 and 2 show the
desired balance was achieved.

Tables 1 and 2 show the comparability of the matched groups separately 
for the matches at the second, third and fourth pregnancy. Table 1 exhibits 
perfect balance for categories of race/ethnicity, region of the US and age at 
the second pregnancy. Moreover, the interactions of these three variables are 
also exactly balanced. 

[Figure 1: boxplots; horizontal axis: Twin/Different Sex.]

Fig. 1. Age at births in 5040 1-5 nonoverlapping matched sets containing 30,240 mothers,
specifically 5040 mothers who gave birth to a twin at the indicated pregnancy and
25,200 mothers who had at least one child of each sex by the end of the indicated pregnancy.
For 3380 sets matched at the second pregnancy, matching controlled the past, namely age
at the first and second births. For 1358 sets matched at the third pregnancy, matching
controlled the past, namely age at the first, second and third births. For 302 sets matched
at the fourth pregnancy, matching controlled the past, namely age at the first, second, third
and fourth births.


Fig. 2. Mother’s education at the time of various births in 5040 1-5 nonoverlapping
matched sets containing 30,240 mothers, specifically 5040 mothers who gave birth to a
twin at the indicated pregnancy and 25,200 mothers who had at least one child of each
sex by the end of the indicated pregnancy. Each match controls the past, not the future.
For graphical display in the boxplots, education is truncated at 6 years despite a few values
below that.


4. Inference: Tobit effects, proportional effects, sensitivity analysis. Figure 3
depicts two outcomes recorded on Census day for the 30,240 mothers
in 5040 matched sets, each set containing one mother who had a twin at
the indicated pregnancy and 5 mothers who had at least one child of each
sex at the indicated pregnancy. One outcome is the total number of children
recorded on Census day. The other outcome is the work fraction where 0
indicates no work for pay and 1 indicates full time work (≥ 40 hours per
week). The work fraction is the number of weeks worked in the last year
multiplied by the minimum of 40 and the number of hours worked in the
last week, and then this product is divided by 40 × 52 to produce a number
between 0 and 1. (A small fraction of mothers worked substantially more
than 40 hours in the previous week.)
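
The definition of the work fraction can be checked against matched set #836 from Section 3.1. The sketch below is illustrative only; the function name is hypothetical.

```python
def work_fraction(hours_last_week: float, weeks_last_year: float) -> float:
    """min(hours, 40) * weeks / (40 * 52), a number between 0 and 1."""
    return min(hours_last_week, 40) * weeks_last_year / (40 * 52)


# Matched set #836: two controls worked full time all year,
# three controls did not work, and the mother of twins worked
# 40 hours in the previous week and 20 weeks in the previous year.
controls = [work_fraction(40, 52), work_fraction(40, 52),
            work_fraction(0, 0), work_fraction(0, 0), work_fraction(0, 0)]
twin_mother = work_fraction(40, 20)

print(twin_mother)                      # 20/52 of full time
print(sorted(controls + [twin_mother], reverse=True))

# Hours above 40 are capped, so the fraction never exceeds 1.
print(work_fraction(60, 52))
```

With these values the mother of twins ranks third of six, which agrees with the description of this matched set.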

Table 1

In each matched risk set containing $J = 6$ mothers, a mother of a twin at birth $k$ is
matched to $J - 1 = 5$ control mothers whose kth birth was a single child whose sex was
different from one of her previous children. The matching was exact for four age
categories, for four race/ethnicity categories and for four regions of the US, and because
it was exact, it controlled their interactions. The table displays counts and percents,
where the count for controls is always five times the count for twins. Only one column of
percents is displayed because the percents in the two groups are identical

                       2nd birth                3rd birth                4th birth
Covariate          Twin  Control    %       Twin  Control    %       Twin  Control    %

Mother’s age at her second pregnancy
  ≤ 18              182      910    5        167      835   12         63      315   21
  19-22            1239     6195   37        677     3385   50        163      815   54
  23-25            1044     5220   31        350     1750   26         63      315   21
  ≥ 26              915     4575   27        164      820   12         13       65    4

Mother’s race/ethnicity
  Black             505     2525   15        242     1210   18         81      405   27
  Hispanic           87      435    3         63      315    5         11       55    4
  White            2707   13,535   80       1023     5115   75        203     1015   67
  Other              81      405    2         30      150    2          7       35    2

Region of the US
  Northeast         685     3425   20        270     1350   20         60      300   20
  South            1081     5405   32        426     2130   31        100      500   33
  Central           988     4940   29        391     1955   29         93      465   31
  West              626     3130   19        271     1355   20         49      245   16


In the top half of Figure 3, at the second pregnancy, a twin birth shifted 
upward by about 1 child the boxplot of number of children. The shift is 
smaller at the third and fourth pregnancies, where the lower quartile and 
median increase by 1 child, but the upper quartile is unchanged. Presumably, 
some mothers pregnant for the third or fourth time intend to have large 
families and twins did not alter their plans. In the bottom half of Figure 3, 
mothers of twins worked somewhat less, but the difference in work fraction 
is not extremely large. Figure 4 displays the information about work fraction 
in a different format, as a quantile-quantile plot. 

Table 2

Baseline comparison of 30,240 distinct mothers in $I = 5040 = 3380 + 1358 + 302$
nonoverlapping matched sets of $J = 6$ mothers, each set containing one mother who gave
birth to a twin and $J - 1$ control mothers who gave birth to a single child whose sex
differed from that of one of her previous children. The table shows age and education of
mothers at their various births prior to risk-set matching

                              2nd birth           3rd birth           4th birth
Covariate                   Twin   Control      Twin   Control      Twin   Control

Sample size
  # of mothers              3380    16,900      1358      6790       302      1510

Mother’s age in years, mean
  At the census             30.4      30.4      30.7      30.7      31.6      31.6
  At 1st birth              20.4      20.4      19.5      19.5      18.8      18.8
  At 2nd birth              23.5      23.4      21.8      21.8      20.7      20.7
  At 3rd birth                                  25.1      25.1      23.5      23.4
  At 4th birth                                                      26.7      26.6

Mother’s education in years, mean
  At 1st birth              11.9      12.0      11.4      11.4      10.8      10.9
  At 2nd birth              12.2      12.2      11.6      11.6      11.0      11.1
  At 3rd birth                                  11.6      11.6      11.1      11.2
  At 4th birth                                                      11.1      11.2

Mother’s education at 1st birth, %
  High school                 43        43        42        42        32        33
  Some college                19        19        14        14        15        14
  BA or more                   9         9         5         5         3         3

Mother’s education at 2nd birth, %
  High school                 47        47        48        48        39        40
  Some college                20        20        15        15        16        15
  BA or more                  11        11         6         6         4         4

Mother’s education at 3rd birth, %
  High school                                     48        48        41        41
  Some college                                    16        16        16        16
  BA or more                                       6         6         5         5

Mother’s education at 4th birth, %
  High school                                                         41        41
  Some college                                                        16        16
  BA or more                                                           5         5


Fig. 3. Two outcomes in 5040 1-5 nonoverlapping matched sets containing 30,240 mothers,
specifically 5040 mothers who gave birth to a twin at the indicated pregnancy and
25,200 mothers who had at least one child of each sex by the end of the indicated pregnancy.
The upper boxplots indicate the number of children. The lower boxplots indicate the
work fraction, defined to be min(hours worked in the previous week, 40) × (weeks worked
in the previous year)/(40 × 52), so a value of 1 is similar to “full time employment.”


We consider two models for the effect on the fraction worked, $R_{ij}$. One
model is a so-called Tobit effect, named for James Tobin, of twin versus
different-sex-single-child, $Z_{ij}$. The Tobit effect has $r_{Tij} = \max(0, r_{Cij} - \tau)$
and it reflects the fact that a woman’s workforce participation may decline
to zero but not further. For instance, if $\tau = 0.1 = 10\%$, then a mother who
would have worked at least $r_{Cij} = 10\%$ of full-time without twins would work
10% less with twins, $r_{Tij} = r_{Cij} - 10\%$, but a mother who would have worked
$r_{Cij} = 5\%$ or $r_{Cij} = 0\%$ of full-time without twins would not work with twins,
$r_{Tij} = 0\%$. For the Tobit effect, we draw inferences about $\tau$. If $H_0: \tau = \tau_0$
were true, then $\max\{0, R_{ij} - (1 - Z_{ij})\tau_0\} = r_{Tij}$ does not vary with $Z_{ij}$
and satisfies the null hypothesis of no treatment effect. Therefore, $H_0: \tau = \tau_0$ is
the hypothesis of no treatment effect on $\max\{0, R_{ij} - (1 - Z_{ij})\tau_0\}$ and the
confidence interval is obtained in the usual way by inverting the test. In
the usual way, the point estimate solves for $\tau$ an estimating equation that
equates the test statistic to its null expectation. We use the treated-minus-control
mean as the test statistic, but very similar results were obtained
using an M-estimate with Huber’s weight function trimming at twice the
median absolute deviation. See Rosenbaum (2007) and the senmwCI function
in the sensitivitymw package in R for computations.
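
The estimating equation for the Tobit effect can be sketched in a few lines. The sketch below covers only the point estimate in the absence of bias ($\Gamma = 1$), using the treated-minus-control mean, whose null expectation under random assignment within sets is zero. It does not reproduce the sensitivity analysis or the computations in sensitivitymw; the function names and the two matched sets are hypothetical.

```python
def adjusted(r, z, tau0):
    """Adjusted response max{0, R - (1 - Z) * tau0}.

    If the Tobit effect is tau0, this equals r_T for every mother,
    whether she had twins (z = 1) or not (z = 0).
    """
    return max(0.0, r - (1 - z) * tau0)


def mean_difference(sets, tau0):
    """Average over matched sets of treated minus mean-control adjusted responses.

    Each set is (treated_R, [control_R, ...]).
    """
    total = 0.0
    for treated_r, control_rs in sets:
        t = adjusted(treated_r, 1, tau0)
        c = sum(adjusted(r, 0, tau0) for r in control_rs) / len(control_rs)
        total += t - c
    return total / len(sets)


def tobit_estimate(sets, lo=0.0, hi=1.0, tol=1e-9):
    """Bisection for tau solving mean_difference(sets, tau) = 0.

    mean_difference is nondecreasing in tau, so a sign change on
    [lo, hi] brackets a solution.
    """
    if not (mean_difference(sets, lo) <= 0.0 <= mean_difference(sets, hi)):
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_difference(sets, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two hypothetical matched sets, each with one treated and two control work fractions.
toy_sets = [(0.3, [0.5, 0.0]), (0.6, [1.0, 0.4])]
tau_hat = tobit_estimate(toy_sets)
print(round(tau_hat, 4), round(mean_difference(toy_sets, tau_hat), 6))
```

Note that the control mother with work fraction 0 contributes 0 to the adjusted responses for every $\tau_0$, which is the floor that distinguishes a Tobit effect from an additive shift.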

[Figure 4: three panels, for the 2nd, 3rd and 4th pregnancy, with 3380, 1358 and 302 1-5 matched sets respectively; horizontal axis: Control.]

Fig. 4. Quantile-quantile plots of work fraction for twins (vertical) and controls (horizontal)
with the line of equality. The plot shows that women with twins were more likely to
not work, as seen in the horizontal start to the plot, and they worked fewer hours in total,
as quantiles fall below the line of equality.


Table 3

Inference about the Tobit effect $\tau$. For each $\Gamma$, the sensitivity analysis gives the maximum
possible P-value testing the null hypothesis of no treatment effect, $H_0: \tau = 0$, the
minimum one-sided 95% confidence interval and the minimum possible point estimate.
Inferences use the mean, but M-estimates with Huber weights produced similar results

$\Gamma$      P-value                  95% CI                Estimate
1.0      $1.6 \times 10^{-13}$    $\tau \ge 0.0616$     0.0793
1.1      $2.0 \times 10^{-6}$     $\tau \ge 0.0324$     0.0502
1.2      0.0148                   $\tau \ge 0.0058$     0.0237
1.25     0.1512


Table 3 displays inferences about $\tau$, the effect of a twin on hours worked
or, more precisely, on the work fraction. For $\Gamma = 1$, Table 3 displays randomization
inferences assuming the differential comparison of twins versus
different-sex-single-child is free of bias from unmeasured covariates. For
$\Gamma > 1$, sensitivity to unmeasured bias is displayed. The point estimate of
$\tau$ in the absence of bias is 0.0793 or about an 8% reduction in work hours
($0.08 \times 40 = 3.2$ hours per week) for a mother with twins. More precisely,
this is an 8% reduction in work fraction or a reduction of 3.2 hours per week
for any mother who would work at least 3.2 hours if she did not have twins.
The results are insensitive to small biases, say, $\Gamma \le 1.2$, but are sensitive
to moderate bias, $\Gamma = 1.25$; however, we do not expect much bias in the
differential comparison. As noted in Section 2.3 and Rosenbaum and Silber
(2009), in a matched pair, treatment-versus-control comparison, a bias
$\Gamma = 1.25$ is produced by an unobserved covariate that doubles the odds of
treatment and doubles the odds of a positive treatment-minus-control pair
difference in outcomes.

Figure 5 looks at residuals. With $\tau_0 = 0.0793$, Figure 5 plots $\max\{0, R_{ij} - (1 - Z_{ij})\tau_0\}$.
In an infinite sample without bias, this plot would have identical
pairs of boxplots if the Tobit effect were correct. Though not identical in
pairs, the boxplots are similar, except perhaps at the 4th pregnancy where
the sample size is not large. Arguably, the data do not sharply contradict a
Tobit effect.

[Figure 5: three pairs of boxplots, for the 2nd, 3rd and 4th pregnancy, with 3380, 1358 and 302 1-5 matched sets respectively; horizontal axis: Twin/Different Sex.]

Fig. 5. Residuals from the Tobit effect model. The boxplots display
$\max\{0, R_{ij} - (1 - Z_{ij})\tau_0\}$ for $\tau_0 = 0.0793$, the point estimate of $\tau$ at $\Gamma = 1$. In an
infinitely large sample, if the Tobit model were true with this $\tau$ and $\Gamma$, then the pair of
boxplots at each pregnancy would be identical.


Table 4

Inference about the proportional effect, $\beta$. For each $\Gamma$, the sensitivity analysis gives the
maximum possible P-value testing the null hypothesis of no treatment effect, $H_0: \beta = 0$,
the minimum one-sided 95% confidence interval and the minimum possible point
estimate. Inferences use the mean, but M-estimates with Huber weights produced similar
results

$\Gamma$      P-value                  95% CI                  Estimate
1.0      $1.6 \times 10^{-13}$    $\beta \le -0.0365$     -0.0470
1.1      $2.0 \times 10^{-6}$     $\beta \le -0.0191$     -0.0296
1.2      0.0148                   $\beta \le -0.0034$     -0.0139
1.25     0.1512


The second model relates the effect on workforce participation to the
effect on the number of children, that is, the two outcomes in Figure 3. Write
$D_{ij}$ for the number of children, with $D_{ij} = d_{Tij}$ if $Z_{ij} = 1$ and $D_{ij} = d_{Cij}$
if $Z_{ij} = 0$. The second model says the effect of twin-versus-different-sex-single-child
on the workforce outcome is proportional to the effect on the
number of children, $r_{Tij} - r_{Cij} = \beta(d_{Tij} - d_{Cij})$. Under this model,
$R_{ij} - \beta D_{ij} = r_{Tij} - \beta d_{Tij} = r_{Cij} - \beta d_{Cij}$ does not change with $Z_{ij}$, so (i) the null
hypothesis $H_0: \beta = \beta_0$ is tested by testing the hypothesis of no effect of the
treatment $Z_{ij}$ on $R_{ij} - \beta_0 D_{ij}$, (ii) a confidence interval for $\beta$ is obtained in
the usual way by inverting the test, and (iii) a sensitivity analysis for biased
$Z_{ij}$ is conducted in the usual way; see Rosenbaum (1996) and Imbens and
Rosenbaum (2005). This model embodies the exclusion restriction in saying
that if the twin did not alter the total number of children for mother $ij$,
so $d_{Tij} = d_{Cij}$, then it did not alter her workforce participation, $r_{Tij} = r_{Cij}$.
For instance, if mother $ij$ had a twin on her second birth, $Z_{ij} = 1$, she
might have three children, $d_{Tij} = 3$, where perhaps she would have had
two children if she had had a different-sex-single child at the second birth,
$d_{Cij} = 2$, so for this mother the twin causes a 1 child increase in her number
of children, $d_{Tij} - d_{Cij} = 1$, and hence a change in workforce participation
of $r_{Tij} - r_{Cij} = \beta(d_{Tij} - d_{Cij}) = \beta$. Some other mother, $i'j'$, might have had
three children regardless, $d_{Ti'j'} = d_{Ci'j'} = 3$, in which case the twin caused
no increase in her number of children, $d_{Ti'j'} - d_{Ci'j'} = 0$, so $r_{Ti'j'} - r_{Ci'j'} = 0$.
Baiocchi et al. (2010) show that randomization inferences (i.e., inferences
with $\gamma = \phi_s - \phi_{s'} = 0$) for $\beta$ under the model $r_{Tij} - r_{Cij} = \beta(d_{Tij} - d_{Cij})$ are
identical to randomization inferences for the effect ratio,
$\left(\sum_{i=1}^{I}\sum_{j=1}^{J} r_{Tij} - r_{Cij}\right) / \left(\sum_{i=1}^{I}\sum_{j=1}^{J} d_{Tij} - d_{Cij}\right)$,
which is the effect on workforce participation
per added child, and this is true whether or not the exclusion restriction
holds. For instance, $\beta = -0.1$ would be a 0.1 reduction in the average work
fraction per additional child, whether or not $r_{Tij} - r_{Cij} = \beta(d_{Tij} - d_{Cij})$
for each individual $ij$. Without the model $r_{Tij} - r_{Cij} = \beta(d_{Tij} - d_{Cij})$, but
with the exclusion restriction, the effect ratio can be interpreted as the
average effect on workforce participation per child among mothers who had
additional children because of the twin; see Angrist, Imbens and Rubin
(1996).
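
For the treated-minus-control mean, the point estimate of $\beta$ in the absence of bias has a closed form: setting the mean treated-minus-control difference in $R_{ij} - \beta D_{ij}$ to zero gives a ratio of two mean differences. The sketch below computes that ratio on hypothetical data; it does not reproduce the randomization tests, confidence intervals or sensitivity analysis, and the function name is not from any package.

```python
def effect_ratio_estimate(sets):
    """Ratio of treated-minus-control mean differences in R and in D.

    Each set is ((R_treated, D_treated), [(R_control, D_control), ...]),
    with R the work fraction and D the number of children.
    The result solves: mean over sets of the treated-minus-mean-control
    difference in (R - beta * D) equals zero.
    """
    num = 0.0
    den = 0.0
    for (r_t, d_t), controls in sets:
        num += r_t - sum(r for r, _ in controls) / len(controls)
        den += d_t - sum(d for _, d in controls) / len(controls)
    if den == 0:
        raise ZeroDivisionError("twins did not change the mean number of children")
    return num / den


# Two hypothetical matched sets with two controls each.
toy_sets = [
    ((0.4, 3), [(0.5, 2), (0.5, 2)]),
    ((0.0, 4), [(0.2, 3), (0.0, 3)]),
]
print(round(effect_ratio_estimate(toy_sets), 4))
```

In this toy example twins add one child on average and lower the work fraction by 0.1 on average, so the estimate is a 0.1 reduction in work fraction per additional child.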

Table 4 draws inferences about the proportional effect, $\beta$. The test of no
treatment effect is the same as in Table 3, so the P-values in the two analyses
are equally sensitive to unmeasured biases. In the absence of unmeasured
bias, $\Gamma = 1$, the point estimate of $\beta$ suggests a 5% reduction in the work
fraction per additional child. We have been looking at the effects of twins
versus the popular mix of children of both sexes. The effects appear to be
small.

5. Discussion. Isolation, as we have defined it, is used in the following
situation. One of several treatments may be inflicted upon individuals (or
self-inflicted) at certain moments in time. The timing $t$ of treatment may
be severely biased by both measured and unmeasured time-varying covariates,
but there may be two treatments, $s$ and $s'$, such that conditionally
given some treatment at $t$, the occurrence of treatment $s$ in lieu of treatment
$s'$ is close to random. Isolation focuses attention on that brief moment
and random aspect by controlling for measured time-dependent covariates
using risk-set matching and by removing a generic bias using a differential
comparison. Stated precisely, isolation refers to the radical simplification of
the conditional probability in (1) that occurs when $\phi_s = \phi_{s'}$; then, the unobserved
time dependent covariate $u_{ijt}$ that would bias most comparisons does
not bias a risk-set match of treatment $s$ in lieu of $s'$. This radical simplification,
when it occurs, justifies one very specific analysis: the comparison
of matched sets with similar observed histories to time $t$ where some individual
received treatment $s$ and the rest received treatment $s'$. In the case
study, the timing of births is biased by a woman’s plans and aspirations for
education, career and family, but conditionally given a birth at time $t$, the
occurrence of twins rather than a single birth is largely unaffected by her
plans.

In a different study that employed similar reasoning, Nagin and Snodgrass
(2013) examined the effects of incarceration on subsequent criminal activity.
The substantial difficulty is that judges decide in a thoughtful manner
whether to imprison an individual convicted for a crime. When two people
are convicted of the same crime, it is far from a random event when one
is sent to prison and the other is punished in a different way. Nagin and
Snodgrass looked at counties in Pennsylvania in which some judges were
much harsher than others, sending many more convicts to prison. Committing
a crime is not haphazard, nor is a judge’s decision, but having your
case come to trial when judge A rather than judge B is next available is,
in most instances, a haphazard event. Nagin and Snodgrass contrasted the
subsequent criminal activity of individuals with similar pasts who were tried
before harsh judges and those tried before lenient judges in the same county
at about the same time, so each convict might have received either judge.
They found little or no evidence in support of the widespread belief that
harsher judges and harsher sentences reduce the frequency of subsequent
rearrest.

A similar strategy is sometimes used in studies of differential effects of 
biologically different drugs used to treat the same disease. The differential
effect may be less confounded than the absolute effect of either drug,
particularly if the choice of drug is determined by something haphazard. 
For example, Brookhart et al. (2006) compared the gastrointestinal toxicity 
caused by COX-II inhibitors versus NSAIDs by comparing the patients of 
physicians who usually prescribe one versus those who usually prescribe the 
other. See also Gibbons et al. (2010) and Ryan et al. (2012). 